Smart Paper Notes

Machine Learning Emulation of 3D Cloud Radiative Effects

David Meyer , Robin J. Hogan , Peter D. Dueben , Shannon L. Mason

Categories: Machine Learning

2021-03-22

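The treatment of cloud structure in numerical weather and climate models is often greatly simplified to make them computationally affordable. Here we propose to correct the European Centre for Medium-Range Weather Forecasts 1D radiation scheme ecRad for 3D cloud effects using computationally cheap neural networks. 3D cloud effects are learned as the difference between ecRad's fast 1D Tripleclouds solver, which neglects them, and its 3D SPARTACUS (SPeedy Algorithm for Radiative TrAnsfer through CloUd Sides) solver, which includes them but is about five times more computationally expensive. With typical errors between 20% and 30% of the 3D signal, the neural networks improve Tripleclouds' accuracy for an increase in runtime of about 1%. Thus, rather than emulating the whole of SPARTACUS, we keep Tripleclouds unchanged for cloud-free parts of the atmosphere and 3D-correct it elsewhere. Focusing on the comparatively small 3D correction instead of the entire signal allows us to improve predictions significantly, if we assume a similar signal-to-noise ratio for both.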
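The correction strategy in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the list-based profiles, and the use of a per-entry cloud fraction to decide where the atmosphere is cloud-free are all hypothetical, and in practice the correction would come from a trained neural network.

```python
from typing import List, Sequence


def training_target(flux_3d: Sequence[float], flux_1d: Sequence[float]) -> List[float]:
    """Quantity a network would learn: the 3D-minus-1D flux difference."""
    return [f3 - f1 for f3, f1 in zip(flux_3d, flux_1d)]


def apply_3d_correction(
    flux_1d: Sequence[float],
    correction: Sequence[float],
    cloud_fraction: Sequence[float],
) -> List[float]:
    """Add a predicted 3D correction to the 1D result, cloudy entries only.

    flux_1d        -- output of the fast 1D solver (hypothetical values)
    correction     -- predicted 3D-minus-1D difference for each entry
    cloud_fraction -- cloud fraction for each entry; 0 means cloud-free
    """
    if not (len(flux_1d) == len(correction) == len(cloud_fraction)):
        raise ValueError("all profiles must have the same length")
    # Cloud-free entries keep the unchanged 1D value.
    return [
        f + c if cf > 0.0 else f
        for f, c, cf in zip(flux_1d, correction, cloud_fraction)
    ]
```

Learning only the difference means the network's errors scale with the small correction rather than with the full flux, which is the signal-to-noise argument the abstract makes.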

translated by Google Translate

Doubly Robust Kernel Statistics for Testing Distributional Treatment Effects Even Under One Sided Overlap

Jake Fawkes , Robert Hu , Robin J. Evans , Dino Sejdinovic

Categories: Machine Learning (Statistics) | Machine Learning

2022-12-09

As causal inference becomes more widespread the importance of having good tools to test for causal effects increases. In this work we focus on the problem of testing for causal effects that manifest in a difference in distribution for treatment and control. We build on work applying kernel methods to causality, considering the previously introduced Counterfactual Mean Embedding framework (CfME). We improve on this by proposing the Doubly Robust Counterfactual Mean Embedding (DR-CfME), which has better theoretical properties than its predecessor by leveraging semiparametric theory. This leads us to propose new kernel based test statistics for distributional effects which are based upon doubly robust estimators of treatment effects. We propose two test statistics, one which is a direct improvement on previous work and one which can be applied even when the support of the treatment arm is a subset of that of the control arm. We demonstrate the validity of our methods on simulated and real-world data, as well as giving an application in off-policy evaluation.

SODA: A Natural Language Processing Package to Extract Social Determinants of Health for Cancer Studies

Zehao Yu , Xi Yang , Chong Dang , Prakash Adekkanattu , Braja Gopal Patra , Yifan Peng , Jyotishman Pathak , Debbie L. Wilson , Ching-Yuan Chang , Wei-Hsuan Lo-Ciganic

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-12-06

Objective: We aim to develop an open-source natural language processing (NLP) package, SODA (i.e., SOcial DeterminAnts), with pre-trained transformer models to extract social determinants of health (SDoH) for cancer patients, examine the generalizability of SODA to a new disease domain (i.e., opioid use), and evaluate the extraction rate of SDoH using cancer populations. Methods: We identified SDoH categories and attributes and developed an SDoH corpus using clinical notes from a general cancer cohort. We compared four transformer-based NLP models to extract SDoH, examined the generalizability of NLP models to a cohort of patients prescribed opioids, and explored customization strategies to improve performance. We applied the best NLP model to extract 19 categories of SDoH from the breast (n=7,971), lung (n=11,804), and colorectal cancer (n=6,240) cohorts. Results and Conclusion: We developed a corpus of 629 cancer patients' notes with annotations of 13,193 SDoH concepts/attributes from 19 categories of SDoH. The Bidirectional Encoder Representations from Transformers (BERT) model achieved the best strict/lenient F1 scores of 0.9216 and 0.9441 for SDoH concept extraction, and 0.9617 and 0.9626 for linking attributes to SDoH concepts. Fine-tuning the NLP models using new annotations from opioid use patients improved the strict/lenient F1 scores from 0.8172/0.8502 to 0.8312/0.8679. The extraction rates among 19 categories of SDoH varied greatly, where 10 SDoH could be extracted from >70% of cancer patients, but 9 SDoH had a low extraction rate (<70% of cancer patients). The SODA package with pre-trained transformer models is publicly available at https://github.com/uf-hobiinformatics-lab/SDoH_SODA.

Efficient brain age prediction from 3D MRI volumes using 2D projections

Johan Jönemo , Muhammad Usman Akbar , Robin Kämpe , J Paul Hamilton , Anders Eklund

Categories: Computer Vision | Machine Learning

2022-11-10

Using 3D CNNs on high resolution medical volumes is very computationally demanding, especially for large datasets like the UK Biobank which aims to scan 100,000 subjects. Here we demonstrate that using 2D CNNs on a few 2D projections (representing mean and standard deviation across axial, sagittal and coronal slices) of the 3D volumes leads to reasonable test accuracy when predicting the age from brain volumes. Using our approach, one training epoch with 20,324 subjects takes 40 - 70 seconds using a single GPU, which is almost 100 times faster compared to a small 3D CNN. These results are important for researchers who do not have access to expensive GPU hardware for 3D CNNs.
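The projection step in this abstract can be illustrated with NumPy. This is a minimal sketch, not the authors' code: the function name `project_volume` is hypothetical, and which array axis corresponds to the axial, sagittal, or coronal direction depends on how the volume is stored, so the axis order here is an assumption.

```python
import numpy as np


def project_volume(volume: np.ndarray) -> list:
    """Reduce a 3D volume to six 2D images: mean and std along each axis.

    The returned list is ordered as
    [mean_axis0, std_axis0, mean_axis1, std_axis1, mean_axis2, std_axis2].
    """
    if volume.ndim != 3:
        raise ValueError("expected a 3D array")
    images = []
    for axis in range(3):
        # Collapsing one axis turns the stack of slices into a single 2D image.
        images.append(volume.mean(axis=axis))
        images.append(volume.std(axis=axis))
    return images


# Usage with a small synthetic volume standing in for an MRI scan.
demo = np.arange(24, dtype=float).reshape(2, 3, 4)
demo_images = project_volume(demo)
```

Each projected image can then be fed to an ordinary 2D CNN, which is where the reported speed-up over a 3D CNN would come from.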

Data-adaptive Transfer Learning for Translation: A Case Study in Haitian and Jamaican

Nathaniel R. Robinson , Cameron J. Hogan , Nancy Fulda , David R. Mortensen

Categories: Natural Language Processing

2022-09-13

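Multilingual transfer techniques often improve low-resource machine translation (MT). Many of these techniques are applied without considering data characteristics. We show in the context of Haitian-to-English translation that transfer effectiveness is correlated with the amount of training data and with the relationships between knowledge-sharing languages. Our experiments suggest that for some languages beyond a threshold of authentic data, back-translation augmentation methods are counterproductive, while cross-lingual transfer from a sufficiently related language is preferred. We complement this finding by contributing a rule-based French-Haitian orthographic and syntactic engine and a novel method for phonological embedding. When used with multilingual techniques, orthographic transformation yields statistically significant improvements over conventional methods. In very low-resource Jamaican MT, code-switching with a transfer language for orthographic resemblance yields a 6.63 BLEU point advantage.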

Tree-based Subgroup Discovery In Electronic Health Records: Heterogeneity of Treatment Effects for DTG-containing Therapies

Jiabei Yang , Ann W. Mwangi , Rami Kantor , Issa J. Dahabreh , Monicah Nyambura , Allison Delong , Joseph W. Hogan , Jon A. Steingrimsson

Categories: Machine Learning (Statistics)

2022-08-30

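The rich longitudinal individual-level data available from electronic health records (EHRs) can be used to examine treatment effect heterogeneity. However, estimating treatment effects using EHR data poses several challenges, including time-varying confounding, repeated and temporally non-aligned measurements of covariates, treatment assignments and outcomes, and loss to follow-up due to dropout. Here, we develop the Subgroup Discovery for Longitudinal Data (SDLD) algorithm, a tree-based algorithm for discovering subgroups with heterogeneous treatment effects using longitudinal data. It combines the generalized interaction tree algorithm, a general data-driven method for subgroup discovery, with longitudinal targeted maximum likelihood estimation. We apply the algorithm to EHR data to discover subgroups of people living with human immunodeficiency virus (HIV) who are at higher risk of weight gain when receiving dolutegravir-containing antiretroviral therapies (ARTs) than when receiving non-dolutegravir-containing ARTs.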

Where is VALDO? VAscular Lesions Detection and segmentatiOn challenge at MICCAI 2021

Carole H. Sudre , Kimberlin Van Wijnen , Florian Dubost , Hieab Adams , David Atkinson , Frederik Barkhof , Mahlet A. Birhanu , Esther E. Bron , Robin Camarasa , Nish Chaturvedi

Categories: Computer Vision | Artificial Intelligence

2022-08-15

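Imaging markers of cerebral small vessel disease provide valuable information on brain health, but their manual assessment is time-consuming and hampered by substantial intra- and inter-rater variability. Automated rating may benefit biomedical research as well as clinical assessment, but the diagnostic reliability of existing algorithms is unknown. Here, we present the VAscular Lesions DetectiOn and Segmentation ("Where is VALDO?") challenge, which was run as a satellite event at the international conference on Medical Image Computing and Computer Aided Intervention (MICCAI) 2021. This challenge aimed to promote the development of methods for automated detection and segmentation of small and sparse imaging markers of cerebral small vessel disease, namely enlarged perivascular spaces (EPVS) (Task 1), cerebral microbleeds (Task 2), and lacunes of presumed vascular origin (Task 3), while leveraging weak and noisy labels. Overall, 12 teams participated in the challenge, proposing solutions for one or more tasks (4 for Task 1 - EPVS, 9 for Task 2 - Microbleeds, and 6 for Task 3 - Lacunes). Multi-cohort data were used in both training and evaluation. Results showed a large variability in performance both across teams and across tasks, with promising results notably for Task 1 - EPVS and Task 2 - Microbleeds, and no practically useful results yet for Task 3 - Lacunes. The challenge also highlighted performance inconsistency across cases that may deter use at an individual level, while still proving useful at a population level.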

Evolution and trade-off dynamics of functional load

Erich Round , Rikker Dockum , Robin J. Ryder

Categories: Natural Language Processing

2021-12-22

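Functional load (FL) quantifies the contributions made by phonological contrasts to distinctions across the lexicon. Previous research has linked particularly low values of FL to sound change. Here we broaden the scope of enquiry into FL, to its evolution at all values. We apply phylogenetic methods to examine the diachronic evolution of FL across 90 languages of the Pama-Nyungan (PN) family of Australia. We find a high degree of phylogenetic signal in FL. Though phylogenetic signal has been reported for phonological structures, such as phonotactics, its detection in measures of phonological function is novel. We also find a significant negative correlation between the FL of vowel length and that of the following consonant, that is, a deep-time historical trade-off dynamic, which we relate to known allophony in modern PN languages and compensatory sound changes in their past. The finding reveals a historical dynamic, similar to transphonologization, which we characterize as a flow of contrastiveness between subsystems of the phonology. Recurring across a language family that spans a whole continent and many millennia, our finding provides one of the most compelling examples yet of Sapir's "drift" hypothesis of non-accidental parallel development in historically related languages.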

Machine Learning Emulation of Urban Land Surface Processes

David Meyer , Sue Grimmond , Peter Dueben , Robin Hogan , Maarten van Reeuwijk

Categories: Machine Learning

2021-12-21

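Can we improve the modeling of urban land surface processes with machine learning (ML)? A prior comparison of urban land surface models (ULSMs) found that no single model is "best" at predicting all common surface fluxes. Here, we develop an urban neural network (UNN) trained on the mean predicted fluxes from 22 ULSMs at one site. The UNN emulates the mean output of the ULSMs accurately. Compared to a reference ULSM (Town Energy Balance; TEB), the UNN has greater accuracy relative to flux observations, lower computational cost, and requires fewer input parameters. When coupled to the Weather Research Forecasting (WRF) model using TensorFlow bindings, WRF-UNN is stable and more accurate than the reference WRF-TEB. Although the application is currently constrained by the training data (1 site), we show a novel approach to improving the modeling of surface fluxes by using ML to combine the strengths of several ULSMs into one.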

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

Kaustubh D. Dhole , Varun Gangal , Sebastian Gehrmann , Aadesh Gupta , Zhenhao Li , Saad Mahamood , Abinaya Mahendiran , Simon Mille , Ashish Srivastava , Samson Tan

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2021-12-06

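Data augmentation is an important component in the robustness evaluation of natural language processing (NLP) models, as well as in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework that supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, datacards, and robustness analysis results are publicly available in the NL-Augmenter repository (https://github.com/gem-benchmark/nl-augmenter).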
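The distinction between a transformation (which modifies data) and a filter (which selects a split of the data) can be shown with a generic sketch. None of the names below come from NL-Augmenter; `swap_case`, `is_question`, and `augment` are hypothetical helpers written only to illustrate the two concepts, and the real framework's interfaces may differ.

```python
from typing import Callable, List

# Assumed shapes for this illustration only.
Transformation = Callable[[str], List[str]]  # one sentence in, variants out
Filter = Callable[[str], bool]               # True if the sentence is kept


def swap_case(sentence: str) -> List[str]:
    """Hypothetical transformation: perturb the casing of every letter."""
    return [sentence.swapcase()]


def is_question(sentence: str) -> bool:
    """Hypothetical filter: select sentences that end in a question mark."""
    return sentence.strip().endswith("?")


def augment(data: List[str], transformation: Transformation, keep: Filter) -> List[str]:
    """Apply a transformation to the subset of the data selected by a filter."""
    selected = [s for s in data if keep(s)]
    return [variant for s in selected for variant in transformation(s)]


examples = ["Is it raining?", "It is raining."]
perturbed = augment(examples, swap_case, is_question)
```

Comparing a model's predictions on `examples` and on `perturbed` is one simple way to probe robustness of the kind the abstract describes.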
